Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/onspd header checks and db flag #33

Merged
merged 2 commits into from
Nov 13, 2024

Conversation

GeoWill
Copy link
Contributor

@GeoWill GeoWill commented Nov 11, 2024

First commit sorts out the ONSPD header issue and does some nice checks. When it doesn't match we will get output that looks something like:

CommandError: 
Problem with the fields in /home/will/Downloads/ONSPD_MAY_2024_UK/ONSPD_MAY_2024_UK.csv
  File header: ['bua11', 'buasd11', 'calncv', 'casward', 'ced', 'ctry', 'dointr', 'doterm', 'eer', 'icb', 'imd', 'itl', 'lat', 'lep1', 'lep2', 'long', 'lsoa01', 'lsoa11', 'lsoa21', 'msoa01', 'msoa11', 'msoa21', 'nhser', 'npark', 'oa01', 'oa11', 'oa21', 'oac01', 'oac11', 'oscty', 'oseast1m', 'osgrdind', 'oshlthau', 'oslaua', 'osnrth1m', 'osward', 'parish', 'pcd', 'pcd2', 'pcds', 'pcon', 'pct', 'pfa', 'rgn', 'ru11ind', 'sicbl', 'statsward', 'streg', 'teclec', 'ttwa', 'ur01ind', 'usertype', 'wz11']
  Expected header: ['bua22', 'calncv', 'casward', 'ced', 'ctry', 'dointr', 'doterm', 'eer', 'icb', 'imd', 'itl', 'lat', 'lep1', 'lep2', 'long', 'lsoa01', 'lsoa11', 'lsoa21', 'msoa01', 'msoa11', 'msoa21', 'nhser', 'npark', 'oa01', 'oa11', 'oa21', 'oac01', 'oac11', 'oscty', 'oseast1m', 'osgrdind', 'oshlthau', 'oslaua', 'osnrth1m', 'osward', 'parish', 'pcd', 'pcd2', 'pcds', 'pcon', 'pct', 'pfa', 'rgn', 'ru11ind', 'sicbl', 'statsward', 'streg', 'teclec', 'ttwa', 'ur01ind', 'usertype', 'wz11']
  Fields missing from file:
  - bua22
  Unexpected fields found in file:
  + bua11
  + buasd11
This probably means ONSPD has changed they're csv format and we need to update our import command.

The second commit adds a --database flag to the BaseImporter meaning that classes that inherit can target specific databases. I've added some tests for this, because it's easy not to use the BaseImporter's cursor (i.e. self.cursor).

To test locally you can:

  • got to (eg) everyelection and pip install -e /path/to/uk-geo-utils/
  • Duplcate your existing db createdb -T every_election every_election_20241111_test
  • add this to your local.py DATABASES setting:
        "other": {
          "ENGINE": "django.contrib.gis.db.backends.postgis",
          "NAME": "every_election_20241111_test",
          "USER": "every_election",
          "PASSWORD": "",
          "HOST": "",
          "PORT": "",
      },
    
  • This should fail: python manage.py import_onspd --data-path /home/will/Downloads/ONSPD_AUG_2024/Data/
  • migrate python manage.py migrate. Now the above command should succeed
  • This will fail: python manage.py import_onspd --data-path /home/will/Downloads/ONSPD_AUG_2024/Data/ --database other
  • python manage.py migrate --database other Now it should succeed.

Between the above commands you can check where the data is being imported with select count(*) from uk_geo_utils_onspd; in the relevant dbs.

@GeoWill GeoWill force-pushed the feature/onspd-header-checks-and-db-flag branch from 55a1ada to 05992f7 Compare November 11, 2024 11:01
@GeoWill GeoWill requested a review from chris48s November 11, 2024 11:02
@coveralls
Copy link

coveralls commented Nov 11, 2024

Pull Request Test Coverage Report for Build c7bb37cf-7e7b-422f-a9da-cef3986fe33f

Details

  • 46 of 46 (100.0%) changed or added relevant lines in 4 files are covered.
  • 11 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.5%) to 90.629%

Files with Coverage Reduction New Missed Lines %
uk_geo_utils/base_importer.py 11 79.06%
Totals Coverage Status
Change from base Build f52042f9-6227-41e3-9fca-3874c024013f: -0.5%
Covered Lines: 706
Relevant Lines: 779

💛 - Coveralls

Comment on lines 18 to 21
def __init__(
self, stdout=None, stderr=None, no_color=False, force_color=False
):
super().__init__(stdout, stderr, no_color, force_color)
self.derived_fields = ["location"]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we don't do anything with stdout, stderr, no_color or force_color in this function, you could make this:

def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self.derived_fields = ["location"]

..and then if the function in the base class changes signature, we don't need to account for it here.

for field in sorted(unexpected_fields):
error_msg.append(f" + {field}")
error_msg.append(
"This probably means ONSPD has changed they're csv format and we need to update our import command."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they're --> their

Also, if this happens then really the thing we need to update is our model, isn't it?

cmd.stdout = StringIO() # Suppress output

# Import data to default database
opts = {"data_path": csv_path, "database": DEFAULT_DB_ALIAS}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could test that this is the default behaviour if we don't explicitly supply the database paramat all. I guess calling the command like this we have to specify every param. This might be a situation where testing via call_command() might be a good plan.
https://docs.djangoproject.com/en/4.2/ref/django-admin/#running-management-commands-from-your-code
Using call_command() does prevent us from doing something like mocking out one of the internal functions of our management command class but if all we want to do is suppress/capture stdout/stderr, call_command() allows us to do this.

Comment on lines 24 to 33
def add_arguments(self, parser):
super().add_arguments(parser)

parser.add_argument(
"--header",
help="Specify which header the csv has",
default="aug2022",
choices=["may2018", "aug2022"],
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 963ecb3 we remove the --header flag, but then in 05992f7 we add it back in. Did you mess up resolving a merge conflict maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly!

@chris48s
Copy link
Member

There are a few comments that need follow up but in general this is looking really good 👍

@GeoWill
Copy link
Contributor Author

GeoWill commented Nov 11, 2024

Useful comments, Thanks. I think the fixups address them.

@chris48s
Copy link
Member

In the two places where we switched to using call_command() for the tests, can we now remove the database param from opts and test that it uses DEFAULT_DB_ALIAS if we don't specify any value, or does call_command() still require us to pass a value for database ?

@GeoWill
Copy link
Contributor Author

GeoWill commented Nov 13, 2024

Yep that would make sense wouldn't it :)

This commit also brings in checks to let us know if the header is
correct. If it's not then we get a list of missing/extra fields.
This means it should be obvious next time they change it.

It will also help us catch non erroring changes like nuts -> itl
which we had missed.
@GeoWill GeoWill force-pushed the feature/onspd-header-checks-and-db-flag branch from 5b3eb95 to 6f79e36 Compare November 13, 2024 10:43
@GeoWill GeoWill merged commit c01100d into master Nov 13, 2024
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants